Driven Langevin dynamics: heat, work, and shadow work 
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Common algorithms for simulating Langevin dynamics are neither microscopically reversible, nor 
do they preserve the equilibrium distribution. Instead, even with a time-independent Hamihonian, 
finite time step Langevin integrators model a driven, nonequilibrium dynamics that breaks time- 
reversal symmetry. Herein, we demonstrate that these problems can be properly treated with a 
Langevin integrator that splits the dynamics into separate deterministic and stochastic substeps. 
This allows the total energy change of a driven system to be divided into heat, work, and shadow 
work - the work induced by the finite time step. Through the interpretation of a discrete Langevin 
integrator as driving the system out of equilibrium, we can bring recent developments in nonequi- 
librium thermodynamics to bear. In particular, we can invoke nonequilibrium work fiuctuation 
relations to characterize and correct for biases in estimates of equilibrium and nonequilibrium ther- 
modynamic quantities. 
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I. INTRODUCTION 

We are concerned with simple numerical integrators 
for Langevin dynamics ^, 



dr = V dt 



dt — 7v dt 
m 




(la) 
(lb) 



where the system is driven from equilibrium by a time- 
dependent Hamiltonian T-Lit). Here r and v are time- 
dependent position and velocity, m is mass, / is force, 
/3 = l/fceT, fee is Boltzmann's constant, T is the tem- 
perature of the environment, 7 is a friction coefficient 
(with dimensions of inverse time), and W(i) is a stan- 
dard Wiener process. The force is determined by the 
derivative of the potential energy, / = — dT-L/dr. For 
multi-dimensional, multi-particle systems, r, u, /, and 
dW are vectors, and m is a diagonal matrix. 

In order to simulate Langevin dynamics on a digital 
computer it is necessary to adopt some approximate al- 
gorithm that divides time into discrete steps [5]. How- 
ever, most such schemes have an inherent problem: even 
with a time-independent Hamiltonian, they do not pre- 
serve the canonical equilibrium distribution for % nor do 
they satisfy microscopic reversibility. (By reversibility we 
mean that the probability of sampling a particular trajec- 
tory starting from equilibrium is equal to the probability 
of sampling the trajectory's time reversal, reversing ve- 
locities if necessary.) We show that these pathologies 
arise because discrete time step integrators of Langevin 
dynamics can be viewed as simulating driven nonequilib- 
rium dynamics. This perspective has the advantage that 



the complications generated by this unwanted but in- 
evitable breaking of time-reversal symmetry can be reme- 
died |3iri6j with insights from nonequilibrium statistical 
thermodynamics . 

We can appreciate some of the problems inherent in 
finite time step Langevin dynamics by first considering 
the zero friction limit, 7 = 0, with a time-independent 
Hamiltonian, where Langevin dynamics reduces to de- 
terministic Newtonian dynamics. A simple, popular in- 
tegrator for Newtonian dynamics is the Velocity Verlet 
algorithm 1710, 
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Due to the finite time step, the trajectories generated by 
this algorithm are inaccurate: they do not faithfully fol- 
low the precepts of Newtonian mechanics, and the actual 
energy of the system is not conserved, but rather fluctu- 
ates from one time step to the next. However, the Veloc- 
ity Verlet integration scheme is symplectic (the Jacobian 
of the transformation from old to new positions and ve- 
locities is unity, and therefore the phase space volume 
is conserved [5|), which ameliorates some problems due 
to finite time step: although a finite time step symplec- 
tic integrator does not conserve the energy of the system 
Hamiltonian, it does conserve the energy of a shadow 
Hamiltonian, which is close to the desired Hamiltonian if 
the time step is not too large [2 [TU] . This prevents long 
term drift in the energy. 

Essentially, a finite time step dynamics performs work 
on the system, over-and-above any work due to inten- 
tional perturbations from a time-dependent Hamilto- 
nian [6] . We can imagine this finite time step integration 
scheme in the following way. At the beginning of each 



time step, we first perturb the system Hamiltonian to the 
shadow Hamiltonian, changing the energy of the system. 
The symplectic integrator then updates the position and 
velocity, Eq. ([2]), perfectly preserving the shadow-energy 
of the shadow Hamiltonian. We then switch back to the 
original Hamiltonian, again perturbing the energy. The 
net change in energy of the system during this time step 
is due to work performed on the system by perturbing 
back and forth between the system and shadow Hamilto- 
nian. We can determine this shadow work (also known as 
error work [5] or an effective energy change [11,) during 
each time step by measuring the difference in energy us- 
ing the system Hamiltonian, so we do not need to know 
the form of the shadow Hamiltonian. This shadow work 
is distinct from any protocol work applied to the system 
due to explicit, time-dependent perturbations of the sys- 
tem Hamiltonian. Note that Markov chain Monte Carlo 
simulations do not generate shadow work [12] because 
the dynamics is explicitly detailed balanced: this ensures 
that the trajectories are microscopically reversible [T5] . 
and that the appropriate equilibrium ensemble is pre- 
served for a time-independent Hamiltonian [5]. 

Discretizations of continuous time Langevin dynam- 
ics are essentially a combination of deterministic and 
stochastic dynamics, and suffer from a combination of 
problems as a result. With a finite time step the deter- 
ministic parts of the dynamics tend to pump energy into 
the system in the form of shadow work, driving the sys- 
tem away from equilibrium, whereas the stochastic parts 
of the dynamics relax the velocities back toward the equi- 
librium Maxwell-Boltzmann distribution, removing en- 
ergy from the system in the form of heat. It follows that, 
even for a system with a Hamiltonian that is explicitly 
time-independent, a finite time step Langevin dynam- 
ics has an effective Hamiltonian alternating between the 
system Hamiltonian and the shadow Hamiltonian, and 
thus actually simulates a driven, nonequilibrium system, 
with a net energy flow. Microscopic time-reversal sym- 
metry is broken, and in general we can not determine the 
steady-state, nonequilibrium distribution. These difhcul- 
ties may be circumvented by utilizing a sufhciently small 
time step, but this reduces the accessible time scales and 
is hardly a satisfactory resolution of the problem. 

This interpretation of a discrete Langevin integrator 
in terms of a driven thermodynamic process is the main 
point of the paper. From this perspective, we can invoke 
a wide array of nonequilibrium work fluctuation relations 
to characterize and correct for biases in estimates of equi- 
librium and nonequilibrium thermodynamic quantities. 



II. CONCRETE INTEGRATOR 

To further confound matters, many common algo- 
rithms for Langevin dynamics do not cleanly separate 
the deterministic and stochastic substeps, which compli- 
cates the analysis. We rectify this and the previously- 
discussed complications of Langevin simulations by ex- 



ploiting an integration scheme that is explicitly time- 
symmetric, that cleanly separates the stochastic and de- 
terministic parts of the dynamics, and for which the de- 
terministic parts are symplectic and the stochastic parts 
are detailed balanced. This allows a clean separation 
of the system's energy change into work, shadow work, 
and heat, simplifying our analysis in terms of a driven 
nonequilibrium process. Fortunately, integrators with 
these properties have received recent attention [H El ITH - 
[16] . As a concrete example, we consider the integrator 
of Bussi and Parrinello [TT], where we make the Hamil- 
tonian update explicit: 
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Here, At is the time step by which the simulation clock 
is advanced, f{n) is the force at position r{n) due to 
the Hamiltonian H(n), a — exp(— 7At), and A/"^ and 
A/"" are independent, normally distributed random vari- 
ables with zero mean and unit variance (hence, when 
scaled by {(3m)~-^/^, distributed according to the equi- 
librium Maxwell-Boltzmann velocity distribution). The 
first and last substeps ( 3a|3g l are stochastic, Marko- 
vian, and detailed-balanced (with respect to the canoni- 
cal measure) velocity randomizations, which leave the po- 
sition unchanged. The five middle substeps ( 3b][3f| ) con- 
stitute the deterministic Velocity Verlet integrator (l2|, 
with the midpoint Hamiltonian update made explicit. 
The order of substeps and effective Hamiltonian switches 
are illustrated in Fig. [l] Note that the substep pairs 
( 3b|3c ) and ( 3e|3f ) are both individually symplectic. 



III. NONEQUILIBRIUM THERMODYNAMICS 

The total change in energy AE during the step n — >■ 
n + l can be cleanly separated into heat Q, protocol work 
Wprot, and shadow work M^shad: 
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FIG. 1. Timeline f or t he Bussi-Parrinello integrator ([st . The 
stochastic substep (Sal randomizes the velocity, transferring 



heat between the system and environment, while the Hamilto- 
nian is fixed and position unchanged. We then switch from the 
system to the shadow Hamiltonian, performing shadow work 
on the system. Substeps ( 3b I and (|3c[) update the velocity 



and then position according to the symplectic dynamics of the 
shadow Hamiltonian, exactly conserving the energy. We then 
switch back to the system Hamiltonian (performing shadow 
work), and in (3d I update the system Hamiltonian from T-iiji) 
to H(n + 1), according to the prescribed protocol A. This ac- 
tion performs protocol work on the system. We switch back to 
the shadow Hamiltonian (doing shadow work), symplectically 
update position and then velocity ( 3e|3f I, then restore the sys- 
tem Hamiltonian (again performing shadow work) . Finally we 
conclude with another velocity randomization substep (|3p 



Here, AEa"g are the energy changes during the corre- 
sponding substeps of Eq. ([3|. Heat is the energy ex- 
changed with the thermal environment, protocol work 
is the energy change due to deliberate manipulation of 
the Hamiltonian, and shadow work is the energy change 
due to the finite time step of the symplectic part of 
the integrator. The essential distinction between heat 
and work is that heat flow is change of the system en- 
ergy due to change in the current distribution over mi- 
crostates, whereas work is change of energy due to change 
in the equilibrium distribution over microstates. The 
former contribution, the work, can be due to either ex- 
plicit change of the time-dependent system Hamiltonian 
(protocol work) or alternation between the system and 
shadow Hamiltonians (shadow work). The evaluations of 
protocol work, heat and shadow work are illustrated in 
Fig.[l] 



A central relation of driven, nonequilibrium thermody- 
namics relates the microscopic irreversibility of trajecto- 
ries to the work W[X, A] performed on the system during 
the forward protocol p!7Hl9| . 
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Here, X is a trajectory through phase space between 
time and NAt, A represents a protocol for perturb- 
ing the system (typically through the time-dependence 
of the system Hamiltonian), Ai^cq[A] is the free energy 
difference between the equiUbrium distributions for the 
initial and final values of the system Hamiltonian, and 
P[ A I a] is the probability of the trajectory, given the 
protocol and an initial equilibrium ensemble. The time- 
reversed protocol A (time-reversed trajectory X) retraces 
the same series of perturbations (phase space transitions) 
as the forward protocol A (forward trajectory X), but 
under time-inversion and hence in reverse. Subject to a 
protocol, a driven system is microscopically reversible if 
the probability of a trajectory and its time-reversal are 
identical, and therefore the work imposed by the protocol 
equals the free energy change [20] . 

It is straightforward to extend this relation to mixed 
stochastic-deterministic dynamics, such as the Bussi- 
Parrinello integrator, Eq. (Is]). Here, the total work 
W = X)„ W^^"^ is the sum of the contributions W^"'' from 
individual steps. The stochastic velocity randomization 
substeps obey Eq. (Is]) since they are balanced, in that 
they preserve the canonical equilibrium distribution. The 
set of deterministic Velocity Verlet substeps also obeys 
Eq. ^L so long as the total work includes the shadow 
work 10 [5] , since the dynamics is symplectic and micro- 
scopically reversible with respect to the shadow Hamilto- 
nian [31 (TO] . Since both the deterministic and stochastic 
substeps are Markovian, it follows that we can safely in- 
termix the two dynamics, and ((sl) will still hold. It there- 
fore follows that Bussi-Parrinello obeys various derived 
relations of nonequilibrium statistical dynamics, such as 
the Jarzynski equality [5T] , fluctuation relations [T71 [22] , 
interrelations between path ensemble averages [HI [53] 
and various interrelations between dissipation and time 
asymmetry |24fl27| . Furthermore, by its separation of 
protocol work and shadow work, Bussi-Parrinello per- 
mits the separation of the respective contributions to mi- 
croscopic irreversibility of deliberate perturbation (phys- 
ically meaningful) and the finite time step (a discretiza- 
tion artifact). Notably, the statistics of the protocol work 
alone systematically deviate from those of the total work, 
and hence lead to biased inference when using the ma- 
chinery of nonequilibrium thermodynamics. In j |VI| we 
explicitly demonstrate this underappreciated point. 



IV. PERTURBATION FROM EQUILIBRIUM 

Given the widespread use of finite time step mixed 
stochastic/deterministic dynamical simulations in chem- 



istry, biology, and physics, a question of significant prac- 
tical interest presents itself: How far from equilibrium is 
the effective nonequilibrium steady state induced by this 
time discretization for a system with time-independent 
Hamiltonian? It is common practice to monitor some 
essentially arbitrary, yet easily measured, observable of 
the system, such as the total energy. However, we can 
exploit recent advances in nonequilibrium statistical dy- 
namics to provide a principled characterization of how 
far the system is driven from equilibrium [28j . 

The natural measure of this instantaneous distance the 
system has been driven away from equilibrium is the 
difference between a nonequilibrium free energy |29[ 130] 
Fncq = {E) — TS and the corresponding equilibrium free 
energy for the given Hamiltonian %. If the Hamiltonian 
were held constant and the (previously driven) system 
were allowed to relax to equilibrium, this represents the 
heat that would be lost to the environment, or equiva- 
lently the maximum work that could be imparted to a 
mechanically-coupled system. For the perturbations im- 
posed by the discrete dynamics, this nonequilibrium free 
energy is approximated near equilibrium by |28j 
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where Wshad is the shadow work over the whole simula- 
tion, T^ss is the power (work per unit time) once transients 
have died off and the system has settled into a nonequi- 
librium steady state, and tf — ti is the total simulation 
time. Normalizing this nonequilibrium free energy by the 
size of the system (number of degrees of freedom) pro- 
vides a natural measure of how far from equilibrium each 
degree of freedom is on average. 

To estimate the nonequilibrium steady-state free en- 
ergy for a molecular system, we simulated cubic boxes 
of TIP3P waters of various sizes, both with and without 
constraints (see the Appendix for simulation details) . Ini- 
tial coordinates and momenta were sampled from equi- 
librium in an NPT ensemble at 1 atm and 298 K us- 
ing the generalized hybrid Monte Carlo (GHMC) inte- 
grator [16l [31] . These initial conditions were simulated 
for M steps with the Bussi-Parinello integrator (Eq. (Is])) 
at constant volume (using a collision rate 7 = 9.1/ps) to 
measure the nonequilibrium work to reach steady-state, 
followed by an additional M steps to measure the steady- 
state power. It was determined that, for all systems and 
time steps simulated, M = 1028 steps was sufficient to 
reach steady-state (see Fig. |4]). 

Because the system (a periodic water box) is homo- 
geneous, it is possible to collapse all system sizes onto 
universal curves describing the nonequilibrium free en- 
ergy difference per molecule as a function of time step 
for unconstrained and constrained systems, respectively 
(Fig. pi) . For the unconstrained system, whose numerical 
integration becomes unstable beyond Ai = 1.5 fs, the 
nonequilibrium free energy difference AFncq rapidly rises 
as the time step surpasses the typical time step employed 
for flexible systems, Ai w 1 fs. For a system of 220 wa- 
ters, for example, AFncq = 11.4 ± 0.2 ^bT at At = 1 fs. 
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FIG. 2. Nonequilibrium free energy for TIP3P water boxes, 
normalized by number of waters. Nonequilibrium free ener- 
gies for various system sizes (220 to 3520 TIP3P waters) are 
shown for both unconstrained (left curve) and constrained 
(right curve) simulations, normalized by the number A^H20 of 
waters in the system and the thermal energy ksT. Gray lines 
show empirical fits of the form a ■ At*, with a = 1.23 ■ 10~^ for 
unconstrained simulations and a — 9.97- 10~^ for constrained 
simulations. 



For constrained water boxes, however, Ai^noq reaches this 
magnitude only at large time steps — here, At « 5 fs, not 
far from the stability limit at 6 fs and well beyond 2 fs, 
the standard time step for biomolecular simulations. 

Empirically, the nonequilibrium free energy difference 
Ai^ncq for both unconstrained and constrained systems 
appears to show a quartic dependence on the time step 
At (Fig. [2J gray hues), such that 



AK, 



,-At^ 



(7) 



where the prefactor o depends strongly on whether con- 
straints are employed (see caption of Fig. [2]). This is 
consistent with earlier work observing the strong de- 
pendence of Mctropolization acceptance probabilities on 
time step [32) and highlights how small reductions in 
time step can rapidly reduce the deviations of the sam- 
pled steady-state distribution from the desired equilib- 
rium distribution defined by the system Hamiltonian 
Pcq{x) oc exp[-/3'H(a;)]. 



V. MULTIVARIATE FLUCTUATION 
THEOREM 

Is there a correlation between the shadow work (per- 
formed by integration) and the protocol work (due to ex- 
plicit Hamiltonian changes), or are these work-generating 
processes effectively uncoupled? Cast in a more practi- 
cal sense, what is the effect of shadow work on the dis- 
tribution of protocol work, or how does the presence of 



shadow work affect the symmetry [Eq. (Is])] that proto- 
col work would satisfy in its absence? Analysis of this 
question requires the generalization of work fluctuation 
theorems to the context of two sources of work. These 
results, though formulated specifically for our situation 
of explicit and artifactual work, are entirely general to 
the situation of any two sources of work. Rearrangement 
of Eq. ([5]) and splitting of the work into two distinct work 
contributions Wi , W2 gives 



P[X\A] ^ P[X\A]e 
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Multiplication by delta functions of the two works 
5iWi[X,A]-Wp,ot)S{W2[X,A]-Wsk^d), and integration 
over all trajectories produces what we will refer to as the 
Multivariate Fluctuation Theorem, 



^A(W^prot,W-^shad) 
-fA(^^prot,-Wshad) 



„/3(M^prot + W,had-AF„q) 



(9) 



This gives an expression in terms of the excess work 
Wprot + W^shad — AFgq for the ratio of the joint probabil- 
ity distributions over protocol and shadow works realized 
during the forward and reverse protocols, respectively. 

Eq. ^ is trivially extended to arbitrary decomposi- 
tions of the total work, where each component corre- 
sponds to a group of individual work steps. It thus 
represents a generalization of the work fluctuation theo- 
rem [18] to contexts with multiple sources of work. From 
Eq. ^ several other modified fluctuation theorems can 
be derived that modify a standard fluctuation theorem 
for one of the works with an exponential average over 



the other work. For example, in { VI we derive a Jarzyn- 
ski equation modified due to the presence of shadow 
work (12), and in ^VII we derive a similarly modified 



integrated transient fiuctuation theorem ( 15 ) 



VI. RECOVERING EQUILIBRIUM STATISTICS 

Equipped with our new interpretation of finite time 
step Langevin dynamics as a driven nonequilibrium 
process even in the absence of an explicit driving 
force, nonequilibrium thermodynamics affords various 
approaches for recovering equilibrium properties of the 
system. 

One approach is to maintain the simulation at equi- 
librium by incorporating Monte Carlo moves that con- 
ditionally accept or reject candidate trajectory segments 
or single time steps, for example by using the Metropolis 
criterion Paccopt = min(l,exp{-/3Wshad}) [SS]- In order 
to maintain detailed balance, the velocity must be in- 
verted if the proposed state is rejected [12;, which may 
lead to increased correlation times. Applied to single 
time steps, this is essentially the idea behind generalized 
hybrid Monte Carlo (GHMC) [UllaJ, and when appHed 
to trajectory segments, this is the idea behind work-bias 
Monte Carlo [3S] and Nonequilibrium Candidate Monte 



Carlo (NCMC) [3S]. In either case, Metropolization re- 
sults in an MCMC process that samples the true equilib- 
rium distribution. 

Another approach to recovering equilibrium statistics 
is to perform a Monte Carlo sampling of trajectories [371 
l38] . generating an ensemble of trajectories weighted by 
the Boltzmann-weighted work over the entire trajectory, 
exp{— /3iyshad}- This allows both accurate equilibrium 
statistics and realistic dynamics, albeit at a potentially 
high computational cost. 

Instead of sampling equilibrium trajectories, we can 
alternatively apply nonequilibrium relations, such as 
the Jarzynski equality |21j and path ensemble aver- 
ages [18, .231 ISni SO] , to directly recover equilibrium prop- 
erties from the statistics of a driven system, essentially 
by reweighting trajectories by exp{— /3T4^}, where it is 
important that the work includes both the protocol work 
and shadow work. Note that the initial configurations 
must be sampled from the correct equilibrium ensemble, 
which can be accomplished with standard Markov chain 
Monte Carlo, or with one of the approaches discussed 
above, such as generalized hybrid Monte Carlo. 

We demonstrate the importance of including the 
shadow work by using the Jarzynski equality to esti- 
mate free energy changes in a simple model system. The 
Jarzynski equality reads [H] 






(10a) 
(10b) 



where in the second line we have explicitly split the ef- 
fective thermodynamic work into protocol and shadow 
work. Here, angled brackets with subscript A indicate 
expectations over trajectories starting in the equilibrium 
distribution for the initial value of the Hamiltonian H(0) 
and integrated according to p| with the Hamiltonian 
evolving according to A. Though standard Langevin in- 
tegrators are used in myriad multi-dimensional contexts, 
we examine the shadow work contribution in a simple 
one-dimensional system to suggest the ubiquity of the 
issues raised here. In particular, we consider a parti- 
cle in thermal contact with the environment, subject to 
a quartic potential that is initially stationary and then 
translated at a constant velocity. The exact free energy 
change is zero. When one uses only the protocol work 
(neglecting the shadow work), the Jarzynski free energy 
estimate empirically shows a systematic error that scales 
roughly as Ai^. Using the total thermodynamic work (in- 
cluding the shadow work) eliminates this error, and the 
Jarzynski estimator gives the correct free energy change. 
This effect of neglecting shadow work on the free energy 
estimate is illustrated in Fig. |3] 

We can understand the origin of this error by analyzing 
our estimator in terms of the multivariate fluctuation the- 
orem [Eq. (|9|] derived above in SJVJ Rearranging Eq. (|9|, 
decomposing the joint probability into the marginal and 
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FIG. 3. Ignoring shadow work in Langevin simulations leads to systematic errors in inference of both equilibrium and 
nonequilibrium statistics. Results are from Bussi-Parrinello simulations of 10* independent realizations of a quartic poten- 
tial U = j{x — Xmin)'^ Uniformly translating with velocity 1/2, starting from equilibrium, with unit temperature, mass, spring 
constant, and fricti on c oefficient. Standard errors are smaller than symbol size, (a) Error in free energy calculated from 
Jarzynski equality (10a I as a function of position of the quartic potential, neglecting shadow work (o) and including shadow 
work (x), for At of 1/4 (red) and 1/8 (blue). The exact free energy change is zero, (b) Semi-log plot of error in Jarzynski free 
energy estimate after the quartic potential has moved to r = 2.5, as a function of time step length, neglecting shadow work 
(o) and including shadow work (x). (c) Ratio of left-hand side and right-hand side of integrated transient fluctuation theorem 
(ITFT, Eq. (13l) as a function of position of the quartic potential, neglecting shadow work (o) and including shadow work 
(x), for Af of 1/4 (red) and 1/8 (blue), (d) Semi-log plot of ITFT ratio after the quartic potential has moved to r = 2.5, as a 
function of time step length At, neglecting shadow work (o) and including shadow work (x). 



conditional probabilities, 

-PA(-M^prot,-l^shad) = -PA(-^prot)PA(- VFshad | " Wprot ) , 

(11) 

and integrating over the shadow work, we find that when 
ignoring the contributions of shadow work, the Jarzynski 
estimator of the free energy fHAFcq = — In (e~'^^p"') . 
has a systematic bias from the true free energy change 
/3AFoq that is a function of the distribution of shadow 
works: 



The correction factor 7 = (e"'^'^"'""^)^ is analogous to 
the correction factor that appears in the Jarzynski equal- 
ity with feedback [41]. Curiously, the correction to the 
Jarzynski estimator is solely a function of the shadow 
work distribution, and in particular does not explicitly 
depend on correlations between the shadow work and 
protocol work. 



/?Ai^eq = /3AFeq - ln(e- 



^/JW.h, 



(12) 



VII. RECOVERING NONEQUILIBRIUM 
STATISTICS 

In Fig. |3] we show that ignoring the contribution of 
shadow work leads to systematic errors in estimates of 
nonequilibrium quantities: the system is actually sub- 
ject to a different perturbation than that of the explicit 
time-dependent Hamiltonian, and thus the probability 
distribution of protocol works does not obey the relevant 
time- reversal symmetry ([5]) . We quantitate this by exam- 
ining the integrated transient fluctuation theorem |42j . 
which relates the probability of a negative work to the 
exponentially-weighted average work, conditional on the 
work being positive: 



PjWtot < 0) 
PiWtot > 0) 



(e-^'^-') 



Wtat>a 



(13) 



This relation follows directly from Eq. ^. When ignor- 
ing the shadow work and only measuring protocol work, 
a similar manipulation of Eq. (|9| produces 



^(^prot < 0) ^ ^^-pw,^^ 



M/prot>0 



^(l^prot > 0) 

The ratio of the left-hand side to the right-hand side 



(14) 



P{Wp,ot < 0) 
P{Wp,ot > 0) 



/< 



e /ST^p^otX 



w„ 



t>0 



(15) 



departs from unity to the extent that the system work 
fluctuations do not obey the relevant time-reversal sym- 
metry. Thus using the protocol work rather than the 
total work will produce systematic biases in estimators 
of nonequilibrium quantities (such as the nonequilib- 
rium free energy |28j or the nonequilibrium energetic ef- 
ficiency [43]). 

While we focus here on work distributions, we note 
that discrete integrators can also introduce artifacts into 
other aspects of a system's dynamical evolution, for ex- 
ample producing erroneous free particle diffusion coef- 
ficients and uniform force field terminal drifts. These 
artifacts can be mitigated through timestep rescaling, as 
discussed in Ref . |44j . Where measurements of work and 
heat are not required, correct statistics of nonequilibrium 
trajectories through phase space can be recovered using 
the Metropolis- Adjusted Geometric Langevin algorithm 
(MAGLA) of Bou-Rabee and Vanden-Eijnden, which un- 
der reasonable conditions on the potential energy is path- 
wise convergent to the distribution of trajectories for the 
continuous equations of motion ^45j . 



VIII. EPILOGUE 

For Hamiltonian dynamics, a finite time step sym- 
plectic integrator conserves a shadow Hamiltonian and 
is microscopically reversible. But, as we have seen, for 
Langevin dynamics, discretization of the dynamics leads 



(even for a time-independent Hamiltonian) to a mixed 
deterministic-stochastic nonequilibrium dynamics, which 
preserves the equilibrium distribution of neither the sys- 
tem nor the shadow Hamiltonian, and which is not time- 
reversal symmetric. However, we can measure the work, 
heat, and shadow work, and thereby separate the respec- 
tive contributions to time-reversal symmetry breaking of 
the finite time step and deliberate perturbation. This is 
most easily achieved by utilizing an integration scheme 
that cleanly separates the deterministic and stochastic 
parts of the dynamics, carefully accounts for deliberate 
system perturbations, and for which the stochastic parts 
are balanced and the deterministic parts are symplectic. 
This allows us to apply results from nonequilibrium ther- 
modynamics and recover equilibrium properties of the 
system, even with large time steps, and offers additional 
insight into how the error made by finite time step inte- 
gration can be quantified. 
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Appendix: Simulation details 

Simulations were carried out using the OpenMM GPU- 
accelerated molecular simulation toolkit [46j, develop- 
ment revision r3314. Cubic water boxes of various sizes 
(220, 440, 880, 1760, and 3520 waters) were created using 
the OpenMM "Modeller" tool, and parameterized with 
TIP3P water |17| using the OpenMM "Forcefield" tool. 
Constraints in constrained simulations were enforced us- 
ing the analytical SETTLE algorithm pS]. Lennard- 
Jones interactions were truncated at 9 A, and an ana- 
lytical long-range dispersion correction [49] was added to 
account for interactions beyond this cutoff. Electrostatics 
were handled using the reaction-field algorithm |50| with 
an identical cutoff using an exterior dielectric of 78.5. 



Initial configurations and momenta were sampled from 
an equilibrium NPT ensemble at 1 atm and 298 K 
with the generalized hybrid Monte Carlo (GHMC) al- 
gorithm [ini [31] using a 0.5 fs time step. A Monte 
Carlo molecular-scaling barostat with a proposal size au- 
tomatically determined during equilibration was used for 
pressure control (51] [SI]. After initial equilibration for 
250,000 steps, configurations and momenta were sam- 
pled every 10,000 GHMC steps and subjected to Bussi- 
Parrinello simulation [Eq. p|)] at fixed volume using a 
collision rate of 9.1/ps. These initial conditions were in- 
tegrated for a total of 4096 steps using a variety of differ- 
ent time steps from 0.25 fs to 7 fs, with the accumulated 
shadow work after 2" steps stored (n = 0, 1, . . . , 12). The 
limit of stability was determined by the largest time step 
that did not generate infinite cumulative work values in 
4096 time steps in any sample, and was determined to be 
2 fs for unconstrained simulations and 6 fs for constrained 
simulations. 

To estimate using Eq. (rol the nonequilibrium free en- 
ergy of the steady-state ensemble sampled by discrete 
Langevin integration, the average accumulated shadow 
work after M Bussi-Parrinello steps was used as the work 
to switch into steady-state, while the average dissipated 
power in the next M steps was used as an average steady- 
state power: 



^-^^ ncq 2 



(W^o^a/)ghmc - (W^m^2m)ghmc 



Here the (•)ghmc notation denotes averages computed 
over Bussi-Parrinello simulations initiated from GHMC- 
sampled initial configurations and momenta. Through 
analysis of M = 2" for n = 0, 1, . . . , 11, we found that 
the steady-state power, and hence the estimated nonequi- 
librium free energy, converged after M — 1024 steps [see 
Fig. [4], so we used this value for all subsequent analysis. 



The squared uncertainty in the nonequilibrium free en- 
ergy was estimated as 



<52(AFneq) = [var (Wq^m) + var {Wm^2m) 

-2 cov {Wq-^m,Wm^2m) 1 /(4iVeff ) (A.2) 



(A.l) 



where var (x) and cov (x, y) denote sample variances and 
covariances over the measured set of work values, and 
iVcff is the effective number of uncorrelated samples after 
accounting for the statistical inefficiencies by autocorre- 
lation analysis of sequentially-sampled trajectory work 
values (see Section 2.4 of [53]). 
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